Disk scheduling system with bounded request reordering

ABSTRACT

A disk scheduling system with bounded request reordering. Disk access requests may be performed during traversals of a disk head across a disk. Each traversal may have a specified direction of motion. A plurality of disk accesses may be performed during a disk head traversal. The overall number of disk access requests for a given disk head traversal may be limited to a maximum number N. By limiting the number of disk requests for each traversal, a bound may effectively be placed on the amount of time it takes to satisfy any single disk request.

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] This invention relates to computer data storage and server systems, and more particularly to digital video/audio storage and playback systems supporting multiple continuous media streams.

[0003] 2. Description of the Relevant Art

[0004] Multimedia or video server systems are used in a variety of applications for the storage and playback of video, audio or other multimedia data streams. For example, multimedia servers may be used for broadcast, cable or satellite solutions to distribute multimedia information to clients or consumers. Professional broadcasters and associated service providers, such as networks and affiliates or cable providers, may employ digital video servers to support high bandwidth multimedia broadcast applications including multi-channel program playout, ad insertion, and digital content management. Other applications for multimedia server systems may include computer-based training in which multimedia training materials or lectures may be stored on the server system and accessed by students over a network or the internet.

[0005] Video archiving, browsing and retrieval is another multimedia server application. Various movies may be stored by the server and distributed to users upon request. Video-on-demand or video delivery systems may enable a plurality of users or viewers to selectively watch movies or other audio/video sequences which are stored on one or more video servers or media servers. The video servers may be connected through data transfer channels, such as a broadcast cable system, satellite broadcast system or the internet, to the plurality of users or subscribers. The video servers may store a plurality of movies or other audio/video sequences, and each user can select one or more movies from the video servers for viewing. Each user may have a television or other viewing device, as well as associated decoding logic, for selecting and viewing desired movies. When a user selects a movie, the selected movie may be transferred on one of the data transfer channels to the viewing device of the respective user. Multimedia servers are also found in webcasting applications in which entertainment may be multicast on the internet to different subscribers. Multimedia servers are found in numerous other applications as well.

[0006] To meet the demands of many different applications and users, it is desirable for a multimedia server system to provide flexibility and extensibility. Two important requirements for a multimedia server system are storage space and file system bandwidth. Multimedia data, such as full-motion digital video, requires a large amount of storage and data transfer bandwidth. Thus, multimedia systems use various types of video compression algorithms to reduce the amount of necessary storage and data transfer bandwidth. In general, different video compression methods exist for still graphic images and for full-motion video. Video compression methods for still graphic images or single video frames may be intraframe compression methods, and compression methods for motion video may be interframe compression methods.

[0007] Examples of video data compression for still graphic images are RLE (Run-Length Encoding) and JPEG (Joint Photographic Experts Group) compression. Although JPEG compression was originally designed for the compression of still images rather than video, JPEG compression is used in some motion video applications. Most video compression algorithms are designed to compress full motion video. Examples of video compression techniques are MPEG (Moving Picture Experts Group), MPEG-2, DVI (Digital Video Interactive) and Indeo, among others.

[0008] Even with the use of compression techniques, multimedia applications may still require extremely large amounts of storage. For example, two hours of video encoded at 1 Mb per second may require roughly one gigabyte (1 GB) of storage. A system supporting numerous different content items may require up to several terabytes (TB) of storage. The server system must also be able to provide enough bandwidth for the various users to access selected multimedia content without overloading the storage system. For example, to support 100 simultaneous subscribers viewing multimedia content encoded at 1 Mb per second, a server may need to support a bandwidth in excess of 100 Mb per second when allowing for overhead. If enough bandwidth is not available, then some requests may have to be denied, or the play quality may suffer (video may run too slowly or may appear “jerky”). To meet such storage and bandwidth needs, a multimedia server may utilize one or more RAID (Redundant Array of Inexpensive Drives) storage systems. In a RAID system, for a given multimedia file, blocks of multimedia data may be stored across multiple hard disk units. The blocks may be read out or transferred to the communication network and transmitted or broadcast to the user or users. At the receiving end the blocks may be decoded for user viewing on a display device.
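
The storage and bandwidth figures in the preceding paragraph can be checked with a short calculation. The following is a minimal sketch, assuming decimal units (1 Mb = 1,000,000 bits); the function names and the 10 percent overhead factor are illustrative assumptions and do not come from the specification.

```python
MEGABIT = 1_000_000  # bits per Mb; decimal units are an assumption


def storage_bytes(rate_mbps: float, hours: float) -> float:
    """Bytes needed to store a stream of the given bit rate and duration."""
    return rate_mbps * MEGABIT * hours * 3600 / 8


def aggregate_mbps(subscribers: int, rate_mbps: float, overhead: float = 0.0) -> float:
    """Aggregate bandwidth in Mb/s, inflated by a fractional overhead."""
    return subscribers * rate_mbps * (1 + overhead)


# Two hours at 1 Mb/s is 900,000,000 bytes, i.e. roughly one gigabyte.
two_hour_movie = storage_bytes(1, 2)

# 100 viewers at 1 Mb/s need 100 Mb/s before overhead; the 10% overhead
# used here is a hypothetical value chosen only for illustration.
hundred_viewers = aggregate_mbps(100, 1, overhead=0.1)
```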

[0009] The disks of each hard disk unit may also be considered as being divided into zones. Since they are physically larger, tracks in zones at the outer edge of the disk contain more sectors than tracks in zones near the rotational axis of the disk. Therefore, assuming the disks rotate with a constant angular velocity, the data bandwidth available from the outermost zones is greater than the data bandwidth available from the innermost zones. Even with modern hard disk drives, there can be a 2:1 variation between worst case and average case disk transfer bandwidth due to sectors/track variations between outer and inner zones.

[0010] Many multimedia applications require continuous media streams in which data streams must be delivered at a specified and possibly time-varying data rate and with a specified uniformity of that delivery rate. In some cases, the uniformity of the delivery rate may be adversely affected by the algorithm used to satisfy disk access requests. The use of a “first-come, first-served” disk access algorithm may not always be the most efficient way to satisfy disk requests, as motion of the read-write head (used to access information from the disk) may be less than optimal. Some optimization of head motion may be realized through the use of algorithms that re-order the disk requests. In such re-ordering algorithms, disk requests may be satisfied in an order different from the order in which they were made. One such re-ordering algorithm is known as an “elevator” algorithm. In one typical elevator algorithm, the head of the disk storage system sweeps from the outer disk to the inner disk, satisfying any queued disk requests along the way, and then reverses direction. While this algorithm may allow for more efficient motion of the read-write head, highly non-uniform access times may still be present, as newly arriving requests may be satisfied prior to previously queued requests. A large number of newly arriving requests may cause long delays in satisfying previously queued requests.

[0011] Non-uniform disk access times may be detrimental to many applications, particularly multimedia applications. For example, video playback from a disk storage system may appear erratic when disk access times are non-uniform. Audio playback may be affected in a similar manner. As such, the quality of a multimedia presentation accessed from a disk storage system with non-uniform access times may suffer.

SUMMARY OF THE INVENTION

[0012] The problems outlined above may in large part be solved by a system and method of bounded disk request reordering in accordance with the present invention. In one embodiment, disk access requests may be performed during traversals of a disk head across a disk. Each traversal may have a specified direction of motion. A plurality of disk accesses may be performed during a disk head traversal. In some cases, disk accesses may be performed in an order different from the order in which the original disk access requests were received. The overall number of disk access requests for a given disk head traversal may be limited to a maximum number N. By limiting the number of disk requests for each traversal, a bound may effectively be placed on the amount of time it takes to satisfy any single disk request, despite any reordering. Disk head motion may be optimized as well.

[0013] In a further embodiment, a disk storage system maintains a list of disk head traverses, known as a traversal list. Each traverse includes several components. The first component of a traverse is a variable for the direction of disk head motion for a given traverse, and may be given a value of “low-to-high” or “high-to-low”. In effect, this variable determines whether a given traverse will read from the outer portion of the disk to the inner portion, or vice versa. The second component of the traverse is an ordered list of disk access requests (the disk request list) which are to be satisfied during the given traverse. The third component of a traverse is a variable indicating the number of disk requests in the disk request list. This variable is bounded to a maximum value (“N”) in order to limit the number of disk requests that may be satisfied for a given traverse. A fourth component of a traverse is the Boolean variable “Active”. The active variable may be set to a value of false prior to conducting the traverse, and may become true when the traverse is in effect. The final component of a traverse is the current disk block address, or the disk address at which the disk head is located at a given instant in time. Since the direction of motion of the disk head alternates with each new traversal, the number of traverses in the traversal list may be constrained to be even.
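
The five components of a traverse described above map naturally onto a small record type. The following is a minimal sketch, not the claimed implementation; the names `Traverse`, `Direction`, `max_requests`, and `make_traversal_list`, and the default bound of 4, are hypothetical and chosen only for illustration. The request count is derived from the list length here rather than stored as a separate variable.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Direction(Enum):
    LOW_TO_HIGH = "low-to-high"
    HIGH_TO_LOW = "high-to-low"


@dataclass
class Traverse:
    direction: Direction                               # component 1
    requests: List[int] = field(default_factory=list)  # component 2: block addresses
    active: bool = False                               # component 4: "Active"
    current_block: int = 0                             # component 5: head position
    max_requests: int = 4                              # the bound "N" (illustrative value)

    @property
    def count(self) -> int:
        """Component 3: number of requests in the disk request list."""
        return len(self.requests)

    def is_full(self) -> bool:
        return self.count >= self.max_requests


def make_traversal_list(pairs: int) -> List[Traverse]:
    """Build an even-length list whose directions alternate."""
    traversals: List[Traverse] = []
    for _ in range(pairs):
        traversals.append(Traverse(Direction.LOW_TO_HIGH))
        traversals.append(Traverse(Direction.HIGH_TO_LOW))
    return traversals
```

Building the list in pairs is one simple way to keep the number of traverses even, as the paragraph above requires.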

[0014] The system of one embodiment may perform two algorithms, a queuing algorithm for queuing incoming disk requests, and an execution algorithm for satisfying the queued requests. The queuing algorithm performs the function of placing a newly arrived disk request into a traverse of the traversal list. The newly arrived request may be placed into the disk request list of an active traverse (active=true) or a pending traverse (active=false). The execution algorithm carries out the queued requests of each traverse of the traversal list.

[0015] The structure of the algorithms may allow for optimization of disk head motion and more uniform disk access times, despite any reordering. Since the number of disk requests for a given traverse is bounded by a maximum value (“N”), the amount of time to satisfy a given disk request may be bounded as well. In effect, the system utilizes an elevator algorithm with a bounded maximum delay for a given disk request.
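
To make the bounding idea concrete, the following is a deliberately simplified sketch of a queuing step and an execution step. It is an assumption-laden illustration and not the method of FIGS. 14 and 16: it places each request in the first traverse with room and ignores the current head position within an active traverse. The dictionary layout and the names `enqueue` and `execute` are hypothetical.

```python
import bisect


def enqueue(traversals, block, n_max):
    """Place block in the first traverse holding fewer than n_max requests.

    Returns the index of the chosen traverse, or None if every traverse is full.
    Each request list is kept sorted by block address.
    """
    for index, traverse in enumerate(traversals):
        if len(traverse["requests"]) < n_max:
            bisect.insort(traverse["requests"], block)
            return index
    return None


def execute(traversals):
    """Return blocks in service order: ascending on a low-to-high traverse,
    descending on a high-to-low traverse."""
    order = []
    for traverse in traversals:
        if traverse["direction"] == "low-to-high":
            order.extend(traverse["requests"])
        else:
            order.extend(reversed(traverse["requests"]))
    return order


traversals = [
    {"direction": "low-to-high", "requests": []},
    {"direction": "high-to-low", "requests": []},
]
for block in (50, 10, 30, 70):
    enqueue(traversals, block, n_max=2)
service_order = execute(traversals)
```

Under these assumptions, a request placed in the traverse at index i is preceded by at most i*N + (N-1) other requests, which is the sense in which capping each traverse at N bounds the delay of any single request.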

[0016] Thus, in various embodiments, the system and method of bounded disk request reordering may allow disk requests to be reordered and satisfied within specified bounds. This may result in an optimization of disk head motion, and furthermore, allow for more uniform disk access times. The uniformity of disk access times may make the system more suitable for certain applications in which a relatively steady data stream is required. As such, the system may be particularly suited for use with various multimedia applications.

BRIEF DESCRIPTION OF THE DRAWINGS

[0017] Other objects and advantages of the invention will become apparent upon reading the following detailed description and upon reference to the accompanying drawings in which:

[0019] FIG. 1 illustrates a constant time/variable data rate dependent data placement/scheduling mechanism;

[0020] FIG. 2 is an illustration of a video server and storage system;

[0021] FIG. 3 is an illustration of a distributed multimedia file system employing a number of video servers and file systems;

[0022] FIG. 4 is a detailed diagram of a video storage manager;

[0023] FIG. 5 illustrates one example of a constant data, variable time rate-independent placement mechanism of the video storage manager for two simultaneous continuous media streams;

[0024] FIG. 6 is a flow chart illustrating a constant data, variable time access mechanism employing buffer rings and deadline queues;

[0025] FIG. 7 illustrates a system which provides for both guaranteed rate streams and non-rate-guaranteed available rate accesses;

[0026] FIG. 8 illustrates an example of a cycle by which requests are migrated from the deadline and priority queues to the storage system;

[0027] FIG. 9 is a flow chart illustrating a method for providing storage access for multiple continuous media streams with a rate guarantee and storage access for non-rate guaranteed requests;

[0028] FIG. 10 illustrates a video storage manager combining mechanisms illustrated in FIGS. 4 and 7;

[0029] FIG. 11 is a flow chart illustrating operation of the seek reorder shown in FIG. 10;

[0030] FIG. 12 is a flow chart illustrating storage characterization for admission control;

[0031] FIG. 13 is a flow chart illustrating determination of the optimum number of buffers for a buffer ring for a variety of stream rates;

[0032] FIG. 14 is a flow chart illustrating a method of scheduling disk access requests in a traversal list for one embodiment;

[0033] FIG. 15 is an example of one embodiment of a traversal list which may be used for scheduling disk access requests using the method of FIG. 14; and

[0034] FIG. 16 is a flow chart illustrating one embodiment of a method of executing the disk access requests scheduled using the method in FIG. 14.

[0035] While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and description thereto are not intended to limit the invention to the particular form disclosed, but, on the contrary, the invention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.

DETAILED DESCRIPTION OF THE INVENTION

[0036] Referring now to FIG. 2, a video server and storage system 200 is illustrated. System 200 includes server 202 and storage systems 204. The storage systems 204 may be connected to the server 202 by one or more buses 205. The server may include one or more processors (not shown) which may communicate with the storage systems 204 via a peripheral bus, such as one or more PCI buses and one or more SCSI interfaces. The server 202 may also include a number of codecs for encoding and decoding multimedia data streams. The codecs may also be coupled to one or more PCI buses. Each storage system 204 may include one or more RAID systems as shown.

[0037] In order to support multiple continuous media streams in which data streams are delivered at a specified and possibly time-varying data rate, the server 202 includes a video storage manager 206. The video storage manager controls the storage and access of multimedia streams on the storage systems 204. In a preferred embodiment, multimedia files are stored via the video storage manager 206 in high quality MPEG-2 format, although other suitable compression formats may be used. Clients or requesters for a multimedia stream contract with the video storage manager 206 for access to a file at a desired bit rate. The video storage manager 206 assesses available storage bandwidth and available buffer memory to determine whether or not the request can be met. Once the video storage manager has established that the request can be accommodated, the client is given access to the file at any bit rate up to the contracted rate. If the request exceeds available storage bandwidth and/or buffer memory is exhausted, the video storage manager must reject the request and the client is free to adjust and/or resubmit the request at a later time. By providing a guaranteed stream rate the video storage manager fully supports variable bit rate accesses in addition to constant bit rate accesses. A client may arbitrarily vary the rate of access to a file from zero bits per second to any point up to the contract rate. This flexibility supports a number of features including frame accurate initiation and jog/shuttle functionality.

[0038] Multiple different clients may request different streams at different bit rates from the video storage manager. These streams may be an arbitrary mix of reads, writes, stream rates and files accessed. Each stream may have a different contract rate, and an individual stream may arbitrarily range in rate up to the contract rate, provided that the total aggregate for all stream rates does not exceed the total aggregate streaming capacity of the server system. There is no requirement that all streams be of the same bit rate, or that the bit rate of a stream be chosen from a set of discrete allowable rates. The video storage manager also permits clients to access the same files, different files, or any combination in-between. As will be described below, the video storage manager provides this flexibility without impacting server aggregate bandwidth.

[0039] Turning now to FIG. 3, a distributed multimedia file system 300 is illustrated employing a number of video servers 202 and file systems 204. In this embodiment the file systems 204 communicate with video servers 202 via fibre channel. Each storage system 204 may include a number of RAID systems linked on a fibre channel arbitrated loop (FC-AL). Each video server 202 may also connect to its own local file system or tape library, for example. In addition, other storage systems, such as a tape library, may be accessible to the system on the fibre channel. Clients may request multimedia streams to be sent on transmission network 208. Transmission network 208 may be a computer network, the internet, a broadcast system or any other suitable transmission medium for multimedia streams. A video storage manager executing on one or more of the video servers controls the initiation and addition of multimedia streams for accessing files on storage systems 204. The video storage manager manages multiple continuous media streams to be delivered through a wide range of hardware interfaces, such as MPEG encoders and decoders, DVB multiplexors, ATM, SONET, and ethernet, to transmission network 208.

[0040] The video storage manager, as employed in systems such as those illustrated in FIGS. 2 and 3, addresses how to schedule disk or storage accesses for multiple continuous sequential media streams in a manner that guarantees data for all continuous media streams and provides an accurate mechanism for determining whether a new request for guaranteed rate access can be accommodated.

[0041] Turning now to FIG. 4, a detailed diagram of a video storage manager 206 is shown. The video storage manager 206 includes a request processor 402 which interfaces client requests to stream managers 404. Each stream manager 404 maintains a buffer ring 405. A separate stream manager 404 corresponds to each continuous multimedia stream. A file system 406 is provided for mapping stream accesses to the storage systems 204. Disk schedulers 408 are provided for each storage system 204 to manage the flow of storage accesses to each storage system. Each disk scheduler may include a deadline queue 410, as described in more detail below.

[0042] The video storage manager, file system, and disk scheduler place stream data on the storage systems in a manner that is completely independent of the inherent bit rate of that material. This feature provides for additional flexibility in that clients may transfer content on and off the video storage manager file system with guaranteed rate service at data rates many times higher (or lower) than the inherent rate of the stream data. The video storage manager, file system, and data placement mechanism is a fixed block size mechanism. For example, data is transferred to or from the storage systems in a constant block size. In a preferred embodiment a block size of 256 kilobytes may be chosen. The video stream manager may provide for configuration of the block size during system initiation or configuration. The fixed block size mechanism ensures that no external fragmentation of storage occurs and that internal fragmentation occurs only at the last block of the file (since a file is unlikely to end exactly at a block boundary). Unlike rate-dependent, variable block size mechanisms, which suffer from both external fragmentation and varying levels of per block internal fragmentation that result in great variations in storage requirements for a particular file depending on stream rate and current file system contents, the video storage manager's rate independent fixed block size mechanism ensures predictable storage requirements for any file regardless of rate or current file system contents.
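
The predictability claimed for the fixed block size mechanism can be illustrated with a short calculation: the number of blocks a file occupies depends only on its size, and the only wasted space is the unused tail of the last block. The following is a minimal sketch assuming 256 kilobytes means 256 × 1024 bytes; the function names are hypothetical.

```python
BLOCK_SIZE = 256 * 1024  # 256 kilobytes, the preferred-embodiment block size


def blocks_needed(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of fixed-size blocks required to hold file_size bytes."""
    return -(-file_size // block_size)  # ceiling division on integers


def internal_fragmentation(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Unused bytes in the last block; always less than one block."""
    return blocks_needed(file_size, block_size) * block_size - file_size
```

For example, a 1,000,000-byte file occupies 4 blocks and leaves 48,576 bytes unused in the last block, regardless of the bit rate at which the file is streamed.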

[0043] Turning briefly to FIG. 5, one example of the constant data, variable time rate-independent placement mechanism of the video storage manager is illustrated for two simultaneous continuous media streams. As shown, the data block size is fixed for all media streams, but the time at which a data block is accessed varies for each stream according to the desired bit rate.

[0044] One problem that arises from a constant data (fixed block), variable time access scheduling mechanism is that multiple streams, each with its own frequency and phase of storage accesses, make requests to the storage system and the interaction of these access patterns results in peaks and lulls in the storage activity. The different frequencies and phases of storage accesses by the different streams result in times in which numerous accesses may be pending at once and other times in which very few accesses may be pending. One solution to this problem is to simply require the storage systems to support the peak rate of activity; however, this solution is clearly not cost effective.

[0045] Referring back to FIG. 4, the video storage manager of the present invention addresses the above-noted problem by leveling storage activity by introducing a ring of buffers between each client and the file system. Each media stream is associated with a different buffer ring 405 managed by a stream manager 404. Thus, the stream manager 404 associates a ring of data buffers between the requester of continuous media and the disk subsystems. The number of buffers in a ring is determined according to the contracted guarantee rate of the associated media stream and characteristics of the storage system so that the guaranteed rate is always met. The buffer rings 405 exploit the fact that video streaming is inherently sequential and let the file system pre-queue storage requests. This approach allows future requests to be satisfied during lulls, shifting the load from peaks to valleys and smoothing storage activity over time.

[0046] Each ring 405 of N buffers is used to hold the next N blocks of the continuous media stream to be accessed by the requester. Once a buffer in the ring has its data consumed by the requester, a request for the next block of the media stream is queued to the appropriate disk scheduler 408 in order to fill the now empty buffer. Requests to fill (or empty) buffers of buffer rings 405 are mapped by file system 406 to the appropriate disk scheduler 408. File system 406 maps logical blocks to physical blocks in the storage systems 204. The file system 406 may maintain a map of logical to physical block locations (e.g. an inode). Because requests for multiple streams may be queued in each disk scheduler 408, the system must ensure that future requests from one stream are not fulfilled before more urgent requests from another stream so that the guaranteed rate may be maintained for each stream. To accomplish this goal deadlines are associated with each request submitted to the storage system. The system calculates the deadline to coincide with the time a buffer will be needed by noting the stream rate, the block size and the number of existing unconsumed buffers. When a request for an empty buffer is queued, a deadline time is queued with the request in the appropriate deadline queue 410 in the disk scheduler 408. The deadline time indicates the latest time when the buffer can be filled and still meet the guaranteed rate requirement of the particular stream. The deadline time is calculated as: current_time+(N−1)*buff_time, where N is the number of buffers in the buffer ring 405 and buff_time is the minimum time in which a requestor can consume a buffer without exceeding the contracted rate guarantee. The disk scheduler 408 must now issue the queued requests to the particular storage system 204 in an order which meets the deadlines associated with the requests. The disk scheduler places requests from continuous media requesters into each deadline queue 410 and maintains an order of earliest to latest so that requests with the earliest deadline are satisfied first.
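
The deadline formula and the earliest-first ordering described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation. It assumes buff_time equals the block size in bits divided by the contracted rate in bits per second, which is one reading of "the minimum time in which a requestor can consume a buffer". The class name `DeadlineQueue` and the tie-breaking by arrival order are also assumptions.

```python
import heapq
import itertools


def buff_time(block_size_bytes: int, rate_bits_per_sec: float) -> float:
    """Seconds needed to consume one block at the contracted rate (assumed model)."""
    return block_size_bytes * 8 / rate_bits_per_sec


def deadline(current_time: float, n_buffers: int, buffer_time: float) -> float:
    """Deadline per the text: current_time + (N - 1) * buff_time."""
    return current_time + (n_buffers - 1) * buffer_time


class DeadlineQueue:
    """Holds requests ordered from earliest to latest deadline."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # breaks ties in arrival order (assumption)

    def push(self, deadline_time: float, request) -> None:
        heapq.heappush(self._heap, (deadline_time, next(self._arrival), request))

    def pop(self):
        """Remove and return (deadline, request) with the earliest deadline."""
        deadline_time, _, request = heapq.heappop(self._heap)
        return deadline_time, request

    def __len__(self) -> int:
        return len(self._heap)
```

For instance, a stream with a ring of 4 buffers whose blocks each take 1.0 second to consume, issuing a request at time 10.0, receives a deadline of 13.0.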

[0047] In order for the system to meet a stream's deadline it must set up a sufficiently large buffer ring to ensure that any request can be queued with the storage system far enough in advance of its deadline so that the worst possible service time for the request will not exceed the deadline. Because worst case service time is a function of the aggregate load on the system, and the aggregate load is a direct result of the aggregate stream rate (independent of the actual stream rate mix), buffer ring size for a particular stream on a given storage system is a function of that particular stream's rate and is independent of the stream rates of other streams in the mix. Given this independence, appropriate ring sizes for various stream rates may be generated at storage characterization time as detailed further below.

[0048] Turning now to FIG. 6, a flow chart is provided illustrating the constant data, variable time access mechanism employing buffer rings 405 and deadline queues 410. When a new stream is initiated, the stream manager for the new stream determines the guaranteed stream rate and the block size for the stream as indicated at 602. The stream is attached to the requested file through the file system 406 and the stream manager 404 creates the buffer ring 405 for the new stream. Requests for blocks from the associated file are then issued to the appropriate storage systems to fill the buffer ring. Each buffer may be sized for one block. After the buffer ring is filled (606) streaming may begin as indicated at 608.

[0049] As each buffer is consumed by the stream requester, a block request is issued along with a deadline time to fill the now consumed buffer, as indicated at 610. The block request and deadline time are queued in the deadline queue 410 for the appropriate storage system according to where the requested block is located. The requests are ordered in the deadline queue from earliest to latest deadline time. Requests are issued from the deadline queue according to the earliest deadline as indicated at 612. During streaming the buffers of the buffer ring are accessed one after another in a circular manner. The deadline time assures that each buffer is filled before it is needed by the stream requester according to the guaranteed rate. The buffer ring and associated deadline times take advantage of the inherently sequential nature of multimedia streaming to pre-queue storage requests. This allows future requests to be satisfied during lulls of storage activity, thus shifting the load from peaks to valleys and smoothing storage activity over time. Note that while FIG. 6 has been described in terms of stream read requests, the same mechanism may be employed for write stream requests. As each buffer is filled with a block of stream data, a request and deadline may be queued in a deadline queue to write the block into the storage system.

[0050] The video storage manager 206 supports a plurality of different media stream clients at different rate guarantees. A different media stream manager 404 and ring buffer 405 may be provided for each stream. A separate disk scheduler 408 and deadline queue 410 are provided for each storage system 204. Thus, each deadline queue 410 may include requests corresponding to several different media streams. The deadline times for each request in the deadline queues 410 are all calculated relative to a common current time so that the earliest deadline from any requester stored in a particular deadline queue is issued first. The time between requests being satisfied for any particular stream varies depending upon the number of other pending requests; however, the associated deadline time assures that the rate guarantee will be met.

[0051] In addition to providing for rate guaranteed continuous media streams, it may be desirable for a multimedia server to provide access to data stored in the storage systems in a prioritized but non-rate guaranteed manner. Such accesses should not impact the guarantees made for the continuous rate-guaranteed media streams. For example, an NFS or FTP requester may wish to access a file. Typically such accesses are non-real-time and no rate guarantee is required. Such accesses may be satisfied using residual disk bandwidth available after all guaranteed rate accesses are satisfied. Any storage bandwidth that remains after all guaranteed rate requests have been met is allocated to a general pool. Available bandwidth clients may access this bandwidth on a first come, first served basis. The video storage manager dynamically determines the amount of available bandwidth. Any bandwidth from an unused guaranteed rate contract may become part of the pool of available bandwidth.

[0052] Turning now to FIG. 7, a system is illustrated which provides for both guaranteed rate streams and non-rate-guaranteed available rate accesses. As shown in FIG. 7 the video storage manager 206 may accept requests from both guaranteed rate clients and available rate clients. A stream buffer 712 may be associated with each guaranteed rate client. In a preferred embodiment, each stream buffer 712 is a buffer ring as described in regard to FIGS. 4 and 6. Guaranteed rate requests are mapped by file system 406 to an appropriate disk scheduler 408 and queued in a guaranteed rate queue 706. In a preferred embodiment the guaranteed rate queue is a deadline queue as described in regard to FIGS. 4 and 6. Available rate requests that are non-rate guaranteed are also mapped by file system 406 to the appropriate disk scheduler for the storage system in which the requested data is located. A data pool 704 may be provided as a shared buffer for the available rate requests. Available rate requests are queued in a priority queue 708 associated with each storage system. Another source of file requests may be the file system 406 itself. These requests may include requests for metadata required to support the various data streams (e.g. blocks that hold lists of blocks to stream, such as indirect blocks). These types of metadata requests may be time critical in that streaming will stop if a stream pointer block (indirect block) pointing to the next data block of the stream is unavailable. Thus, requests for time critical metadata also carry deadlines and may be scheduled directly along with streaming data requests in the guaranteed rate or deadline queue 706. The file system constantly monitors its progress by means of the current indirect block. At an appropriate threshold it calculates a deadline and schedules the fetch of the next indirect block from the storage system. Other metadata requests may be non-critical, such as other types of file management and read and write operations unrelated to streaming (e.g. listing files in the file system). These non-time-critical metadata requests are queued in the priority queues 708. A metadata pool 702 may be associated with file system 406 from which the metadata requests are issued.

[0053] Although other metadata requests and available bandwidth requests do not have strict service time requirements, they may have a priority relationship. For example, metadata writes may be considered the highest priority because their completion may be essential for closing a particular stream episode. Metadata reads may be next in priority to ensure timely processing of file lists, file creations, etc. Available I/O requests may have the lowest priority and may be filled when resources are available. Requests in the priority queues are ordered from highest to lowest priority.

[0054] The disk scheduling mechanism issues the queued requests to the storage system in an order which meets the deadlines associated with the requests and also allocates residual bandwidth after guaranteed requests to non-guaranteed requests in a manner consistent with their associated priorities. A bandwidth allocator 710 may be employed to allocate a certain portion of storage bandwidth to guaranteed rate requests and the remaining bandwidth portion to non-guaranteed priority requests. At storage characterization time a configurable percentage of a storage system's bandwidth is reserved for honoring the non-guaranteed priority requests. For example, 90 percent of the bandwidth may be reserved for the guaranteed rate requests from guaranteed rate queue 706 and the remaining 10 percent allocated to non-rate guaranteed requests from priority queue 708. Based on the percentages reserved for guaranteed and non-guaranteed requests, the disk scheduler chooses a request from one or the other queue to hand off to the operating system to be satisfied from the storage system. When the chosen request queue is empty, the scheduler attempts to de-queue a request from the other queue, thus allowing both non-guaranteed and guaranteed requests to absorb unused storage bandwidth.

[0055] In a preferred embodiment requests are migrated from the deadline and priority queues to the storage system according to a cycle. An example of a cycle is shown in FIG. 8. A cycle consists of a fixed number of slots, with each slot assigned to either the deadline queue or the priority queue in proportion to the desired allocation of disk bandwidth between guaranteed and non-guaranteed accesses. In FIG. 8, slots marked with a D point to the deadline queue and slots marked with a P point to the priority queue. The cycle is repeatedly traversed and a request is chosen from one of the queues according to the current slot. In the example of FIG. 8, the bandwidth is proportioned so that the disk scheduler will first look to the deadline queue for 13 out of every 16 storage accesses and first look to the priority queue for the remaining three out of every 16 accesses. This allocation is merely one example; in a preferred embodiment the allocation may be nine out of every ten slots pointing to the deadline queue and one out of every ten slots pointing to the priority queue. In a preferred embodiment the slots allocated to each use are as evenly distributed as possible throughout the cycle.
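One way such a cycle might be laid out is sketched below. The function name build_cycle and the error-accumulator technique used to spread the slots are illustrative assumptions, not details taken from FIG. 8; the sketch only shows that a 13-of-16 or 9-of-10 split can be distributed so that priority slots are never adjacent.

```python
def build_cycle(total_slots: int, deadline_slots: int) -> list[str]:
    """Return a list of 'D' and 'P' slot markers with the 'D' slots spread evenly.

    Hypothetical helper: uses an integer error accumulator (as in line
    drawing) so the two slot types are interleaved rather than clustered.
    """
    if total_slots <= 0 or not 0 <= deadline_slots <= total_slots:
        raise ValueError("need 0 <= deadline_slots <= total_slots and total_slots > 0")
    cycle = []
    error = 0
    for _ in range(total_slots):
        error += deadline_slots
        if error >= total_slots:
            # Enough 'credit' has accumulated to emit a deadline slot.
            error -= total_slots
            cycle.append("D")
        else:
            cycle.append("P")
    return cycle


if __name__ == "__main__":
    print("".join(build_cycle(16, 13)))
    print("".join(build_cycle(10, 9)))
```

The accumulator always emits exactly deadline_slots 'D' markers per cycle, so the configured bandwidth proportion is preserved regardless of the cycle length chosen.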

[0056] In a preferred embodiment requests from the deadline and priority queues are migrated to the storage system according to the current slot, and the cycle then advances to the next slot. If the queue indicated by the current slot is empty, then an entry from the other queue is chosen if it is not empty. Therefore, non-rate guaranteed requests may actually achieve more than their allocated bandwidth if the full rate guarantee bandwidth through the deadline queue is not being utilized.

[0057] Turning now to FIG. 9, a flow chart is provided illustrating a method for providing storage access for multiple continuous media streams with a rate guarantee and storage access for non-rate guaranteed requests. A portion of the storage bandwidth is allocated to rate guaranteed requests and the residual bandwidth is allocated to non-rate guaranteed requests, as indicated at 902. Rate guaranteed requests are queued in a guaranteed rate queue and non-rate guaranteed requests are queued in a priority queue, as indicated at 904. The rate guaranteed requests are entered into and issued from the rate guaranteed queue in a manner to ensure that they are satisfied in a timely fashion to meet the particular rate guaranteed for each stream. The non-rate-guaranteed requests may be ordered in the priority queue so that higher priority requests are satisfied before lower priority requests. The system then selects a queue to issue a request to the storage system according to a current slot from a cycle that proportions the storage bandwidth according to the bandwidth allocation, as indicated at 906. If the selected queue contains an entry, then that request is issued from the selected queue, as indicated at 908, 910 and 912. If the selected queue is empty, then the system looks to the other queue for a request to issue, as indicated at 908 and 914. If the other queue is not empty, then an entry is removed and issued, as indicated at 916 and 912. The system then traverses the cycle to the next slot, as indicated at 918, and repeats the queue selection process. If the other queue is empty at 914, the process is repeated until a queue is found containing an entry. In one embodiment, the slot is not advanced if both queues are empty. Alternatively, the slot may be advanced if both queues are empty.
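The selection steps of FIG. 9 may be sketched roughly as follows. This is a minimal illustration, not the claimed implementation: the function name select_request is hypothetical, both queues are assumed to be already ordered (earliest deadline first, highest priority first), and the sketch follows the embodiment in which the slot is not advanced when both queues are empty.

```python
from collections import deque


def select_request(cycle, slot, deadline_q, priority_q):
    """Pick the next request to issue and return (request, next_slot).

    cycle      -- sequence of 'D'/'P' slot markers (assumed layout)
    slot       -- index of the current slot in the cycle
    deadline_q -- deque of guaranteed rate requests, earliest deadline first
    priority_q -- deque of non-guaranteed requests, highest priority first
    Returns (None, slot) when both queues are empty.
    """
    if cycle[slot] == "D":
        first, other = deadline_q, priority_q
    else:
        first, other = priority_q, deadline_q
    # Try the queue named by the slot, then fall back to the other queue.
    for queue in (first, other):
        if queue:
            return queue.popleft(), (slot + 1) % len(cycle)
    return None, slot


if __name__ == "__main__":
    cycle = ["D", "P"]
    dq, pq = deque(["d1"]), deque(["p1", "p2"])
    slot = 0
    while True:
        request, slot = select_request(cycle, slot, dq, pq)
        if request is None:
            break
        print(request)
```

In the usage loop the third selection falls on a D slot with an empty deadline queue, so the remaining priority request absorbs the unused guaranteed bandwidth.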

[0058] Turning now to FIG. 10, a video storage manager is illustrated combining the mechanisms as discussed in regard to FIGS. 4 and 7. The storage manager of FIG. 10 supports multiple continuous media streams in which clients contract for access to a file at a guaranteed bit rate. Each stream client is allowed to vary the rate of its access to its file at any rate up to the guaranteed rate. In addition, the storage manager of FIG. 10 supports available bandwidth clients. A certain portion of the storage bandwidth is allocated to available bandwidth or non-rate guaranteed clients, such as available rate client 752. In addition, any bandwidth not used by the guaranteed rate clients may be available for the available rate clients. Thus, the video storage manager of FIG. 10 may support any mix of guaranteed rate clients while delivering the same aggregate bandwidth and also support available rate clients at a non-guaranteed rate.

[0059] As discussed in regard to FIG. 4, each guaranteed rate client communicates with an associated stream manager 404 which maintains a buffer ring 405 for the particular stream. The buffer ring is used to hold the next N blocks of the continuous media stream to be accessed by the requester, where N is the number of buffers in the buffer ring. Each buffer may be sized equally for one block of data per buffer. Once a buffer in the ring has its data consumed by the requester, a request for the now empty buffer along with a deadline time is queued with the appropriate disk scheduler 408 as determined by file system 406. The deadline time indicates the latest time when the buffer request can be satisfied and still meet the guaranteed rate requirement of the stream. The deadline time may be calculated as:

deadline_time=current_time+(N−1)*buff_time

[0060] where N is the number of buffers in the ring and buff_time is a minimum time in which the requester can consume a buffer without exceeding its contracted rate guarantee. Simultaneously with guaranteed rate requests being queued with the appropriate disk scheduler 408, prioritized but non-guaranteed rate requests are also queued. Non-guaranteed rate requests do not carry deadlines but do carry priorities. The disk schedulers issue the queued requests to the storage systems in an order which meets the deadlines associated with the requests while obtaining a high proportion of the disk system bandwidth and allocating residual disk bandwidth after guaranteed requests to non-guaranteed requests in a manner consistent with their priorities.
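As a worked illustration of the deadline formula, the sketch below computes a deadline for a refill request. The helper buff_time_for is an assumption: it takes buff_time to be the block size divided by the contracted rate, which is one natural reading of "minimum time in which the requester can consume a buffer" but is not spelled out in the text.

```python
def buff_time_for(block_size_bytes: int, rate_bytes_per_s: float) -> float:
    """Assumed derivation: fastest time to drain one buffer at the contracted rate."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return block_size_bytes / rate_bytes_per_s


def deadline_time(current_time: float, n_buffers: int, buff_time: float) -> float:
    """deadline_time = current_time + (N - 1) * buff_time, as given in the text."""
    if n_buffers < 1:
        raise ValueError("a buffer ring needs at least one buffer")
    return current_time + (n_buffers - 1) * buff_time


if __name__ == "__main__":
    # Hypothetical numbers: 1 MiB blocks, 512 KiB/s contracted rate, 4-buffer ring.
    bt = buff_time_for(1_048_576, 524_288)
    print(bt, deadline_time(100.0, 4, bt))
```

The (N - 1) factor reflects that the other N - 1 buffers in the ring still hold data when the request is queued, so the refill is only needed once the requester has drained them.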

[0061] Guaranteed requests from continuous stream requesters are placed into an earliest deadline ordered queue 410 in the appropriate disk scheduler. Non-guaranteed rate requests are placed into a separate highest priority ordered queue 708. In addition to requests from available rate clients 752 and guaranteed rate clients 754, requests may also come from the file system itself. Some requests from the file system may be time critical, such as requests for blocks that contain pointers to future stream blocks. Deadlines are associated with these requests and they are inserted in the appropriate deadline queue 410. Other requests, such as non-time critical file management requests, are assigned a priority and inserted in the appropriate priority queue 708. The file system requests may be buffered in a meta pool 702. Available rate client requests may be buffered in a data pool 704.

[0062] Requests are migrated from the deadline and priority queues by a bandwidth allocator 710, according to a cycle which allocates bandwidth according to a configurable allocation. For example, 90 percent of a particular storage system's bandwidth may be assigned to the deadline queue and thus guaranteed rate stream clients, and 10 percent assigned to the priority queue for available rate clients. The bandwidth allocator 710 may migrate requests from the deadline and priority queues to a seek reorder queue 750. Requests may be reordered in the seek reorder queue according to the position of the requested data block on the storage device. The seek reorder queue may have a configurable maximum size. Requests from the deadline and priority queues are migrated to the seek reorder queue according to the current cycle slot whenever the seek reorder queue is not filled to its maximum size. Each migration is done from the queue indicated by the current slot of the cycle and then the cycle advances to the next slot. If the queue indicated by the slot is empty, then an entry from the alternate queue is chosen if it is non-empty. The migrated entry is reordered in the seek reorder queue such that all requests to one side of the entry refer to data blocks with storage addresses greater than or equal to its address and all entries on the other side of the queue request data blocks with disk addresses less than or equal to its address.

[0063] Each seek reorder queue 750 is concurrently traversed continuously in one direction (i.e., in increasing or decreasing disk addresses) until no further entries exist in the queue in that direction, and it then reverses direction and resumes. Thus, the disk scheduler issues requests from the seek reorder queue to the storage system in order of disk addresses and advances to the next request when the previously issued request has been completed by the disk system.

[0064] Because the deadline and priority queues contain requests from many different streams and clients, the sequence of blocks resulting from these queues is essentially random. If these requests were serviced according to their order in the deadline and priority queues, excessive disk seek overhead would result from the random pattern of requests. The seek reorder queue 750 improves seek time by reordering requests out of the deadline and priority queues according to their disk position.

[0065] Turning now to FIG. 11, a flow chart is provided illustrating operation of the seek reorder queue 750. As indicated at 1102, when the seek reorder queue is not full, a request is migrated from either the deadline or priority queue according to the current cycle slot. If the indicated queue is empty, the request is taken from the alternate queue if that queue is non-empty, as indicated at 1104. The migrated request is inserted into the seek reorder queue according to the disk address of the requested block so that requests in the seek reorder queue are ordered by increasing or decreasing disk addresses. Simultaneously, the seek reorder queue is traversed in one direction and the next request is issued to the disk system as indicated at 1108. If the end of the seek reorder queue has been reached, then the direction of queue traversal is reversed as indicated at 1110 and 1114. If the end of the seek reorder queue has not been reached, then the current traversal direction is maintained as indicated at 1110 and 1112. Once the current request has been satisfied by the disk system, the next request in the seek reorder queue is issued to the disk system as indicated at 1116 and 1108.
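The behavior of the seek reorder queue may be sketched as below. The class name SeekReorderQueue and its methods are hypothetical, addresses are modeled as plain integers, and migration from the deadline and priority queues is left to the caller; the sketch only illustrates address-ordered insertion and the reversing traversal.

```python
import bisect


class SeekReorderQueue:
    """Address-ordered queue traversed back and forth (illustrative sketch)."""

    def __init__(self, max_size: int):
        self.max_size = max_size      # configurable maximum queue size
        self.addresses = []           # pending disk addresses, kept ascending
        self.position = 0             # address of the most recently issued request
        self.ascending = True         # current traversal direction

    def is_full(self) -> bool:
        return len(self.addresses) >= self.max_size

    def insert(self, address: int) -> None:
        """Insert a migrated request at its address-ordered position."""
        if self.is_full():
            raise OverflowError("seek reorder queue is full")
        bisect.insort(self.addresses, address)

    def issue_next(self):
        """Return the next address in the current direction, reversing at the end."""
        if not self.addresses:
            return None
        if self.ascending:
            i = bisect.bisect_left(self.addresses, self.position)
            if i == len(self.addresses):      # nothing left at or above the head
                self.ascending = False
                i = len(self.addresses) - 1
        else:
            i = bisect.bisect_right(self.addresses, self.position) - 1
            if i < 0:                         # nothing left at or below the head
                self.ascending = True
                i = 0
        self.position = self.addresses.pop(i)
        return self.position


if __name__ == "__main__":
    q = SeekReorderQueue(max_size=8)
    for a in (50, 10, 30):
        q.insert(a)
    print(q.issue_next(), q.issue_next())   # head sweeps upward
    q.insert(20)                            # arrives behind the head
    print(q.issue_next(), q.issue_next())   # finishes the sweep, then reverses
```

A request that arrives behind the head waits for the return sweep, which is the source of the service time variability that the queue size parameter trades against seek efficiency.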

[0066] As noted earlier, block requests as viewed by the storage system are inherently random because the storage system is presented with requests from many streams. Given this randomness it would be inefficient to sequentially allocate blocks for a particular file. Because I/O cylinders of a disk often have different transfer rates, block allocation within a particular file bounces back and forth between I/O zones of the disk. Thus for any particular stream file, block storage requests are assigned disk addresses so that the blocks will be located in alternating I/O zones of the disk. This ensures that all files see an average storage throughput and that no file being streamed could end up coming entirely from a low performance zone of the disk.

[0067] As mentioned above, the video storage manager must control admission of new continuous streams to ensure that the aggregate of the guaranteed stream rates does not exceed the aggregate storage bandwidth allocated for continuous media streams. Before any streaming is begun, the storage systems are characterized to determine their performance or bandwidth. Once a storage system bandwidth has been determined, then when streaming begins, as each new stream is requested the video storage manager determines whether or not the requested bit rate would exceed the remaining available bandwidth allocated for continuous streams. If so, the request is denied and the requester is free to resubmit the request at a later time or with a lower bit rate request. If sufficient bandwidth exists, the request is granted and a stream manager creates an associated buffer ring as discussed above.
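The admission check described above may be sketched as follows. The class AdmissionController and its method names are hypothetical, and rates are expressed in arbitrary consistent units; buffer ring creation on a successful admission is omitted.

```python
class AdmissionController:
    """Tracks the guaranteed bandwidth already committed to streams (sketch)."""

    def __init__(self, admission_bandwidth: float):
        # Maximum aggregate rate that may be promised to guaranteed rate streams.
        self.admission_bandwidth = admission_bandwidth
        self.committed = 0.0

    def admit(self, requested_rate: float) -> bool:
        """Grant the stream only if its rate fits in the remaining bandwidth."""
        if requested_rate <= 0:
            raise ValueError("requested rate must be positive")
        if self.committed + requested_rate > self.admission_bandwidth:
            return False          # denied; requester may retry later or ask for less
        self.committed += requested_rate
        return True

    def release(self, rate: float) -> None:
        """Return bandwidth to the pool when a stream's contract ends."""
        self.committed = max(0.0, self.committed - rate)


if __name__ == "__main__":
    ctl = AdmissionController(admission_bandwidth=100.0)
    print(ctl.admit(60.0), ctl.admit(50.0), ctl.admit(40.0))
```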

[0068] Because a sequence of requests presented to the storage system while streaming is essentially random, modeling the stream load to characterize storage bandwidth may be simplified. This performance may be characterized with a synthetic load that reflects the characteristics of a typical load. The synthetic load may vary from a purely random sequence of blocks to take into account the fact that blocks for any given file may be placed in alternating I/O disk zones. Thus a representative load may be constructed by constraining the file system to allocate sequential blocks in a zoned random manner. The disk block address range may be divided into two halves and sequential file block allocations may be chosen from random positions within a zone, alternating between the two zones. Disk performance may be characterized using this synthetic load and then de-rated to provide margin. The amount of de-rate may be referred to as the primary de-rate parameter. The de-rated bandwidth value is then multiplied by the fraction of the total bandwidth allocated in the cycle process for guaranteed rate requesters. The resulting guaranteed rate bandwidth may be de-rated again by a secondary de-rate parameter to allow for additional deadline safety margin. The result is the maximum admission bandwidth for the aggregate of all guaranteed rate requests. Guaranteed rate requesters can then be admitted until they have consumed the entire guaranteed rate admission bandwidth.

[0069] Storage characterization for admission control is summarized in FIG. 12. A synthetic load is created by allocating blocks in a zoned random manner so that sequential file block allocations are chosen from random positions within a zone, alternating between I/O disk zones, as indicated at 1202. Storage system bandwidth is determined using this synthetic load as indicated at 1204. The determined bandwidth is de-rated by a primary de-rate parameter to provide a certain margin as indicated at 1206. The de-rated bandwidth is reduced according to the portion of the bandwidth allocated for guaranteed rate requests as indicated at 1208. This portioned bandwidth may then again be de-rated by a secondary de-rate parameter to provide extra deadline margin as indicated at 1210. The resultant bandwidth may then be used as a maximum aggregate admission bandwidth for guaranteed rate streams as indicated at 1212.
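The arithmetic of steps 1206 through 1212 may be illustrated as follows. The text does not state how a de-rate parameter is applied numerically; this sketch assumes each de-rate is a fraction between 0 and 1 that is subtracted from the bandwidth it applies to. The function name and the example figures are hypothetical.

```python
def admission_bandwidth(measured_bw: float,
                        primary_derate: float,
                        guaranteed_fraction: float,
                        secondary_derate: float) -> float:
    """Maximum aggregate bandwidth that may be promised to guaranteed rate streams.

    Assumption: a de-rate of 0.10 means "reduce by 10 percent".
    """
    for name, value in (("primary_derate", primary_derate),
                        ("guaranteed_fraction", guaranteed_fraction),
                        ("secondary_derate", secondary_derate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    primary_throughput = measured_bw * (1.0 - primary_derate)   # step 1206
    guaranteed_bw = primary_throughput * guaranteed_fraction    # step 1208
    return guaranteed_bw * (1.0 - secondary_derate)             # step 1210


if __name__ == "__main__":
    # Hypothetical figures: 100 MB/s measured, 10% primary de-rate,
    # 90% of the cycle for guaranteed requests, 5% secondary de-rate.
    print(admission_bandwidth(100.0, 0.10, 0.90, 0.05))
```

With these example figures the result is roughly 77 MB/s, which would then serve as the limit enforced at admission time (step 1212).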

[0070] The characterization process may also include determining appropriate buffer ring sizes for various stream rates across the storage system's desired operational range. The optimum number of buffers for a buffer ring may be determined for a variety of stream rates as follows. Referring to FIG. 13, for each particular stream rate, the characterization routine creates enough stream simulators to consume the entire aggregate throughput of the storage system as indicated at 1302. For each stream simulator, a ring buffer is modeled as indicated at 1304. Each stream simulator then generates requests for random blocks, alternating between zones, as indicated at 1306. The simulated streams are then run until completion of a test time or until any one of the streams suffers an underrun. An underrun occurs when a buffer request is not completed before the request deadline. In a preferred embodiment, a prefill margin parameter may be set so that an underrun occurs if a buffer request is not completed within the prefill margin time before the request deadline. The number of ring buffers in the model may be adjusted and the simulation repeated as indicated at 1308 and 1310 until the correct ring buffer size is obtained. The entire simulation may then be repeated for a different stream rate as indicated at 1312. Thus a table of appropriate ring buffer sizes may be constructed during characterization for a variety of stream rates up to the maximum stream rates supported by the system. During operation, whenever a new stream is admitted, an appropriately sized ring buffer may be created for the new stream by accessing this table.

[0071] The performance of the video storage manager may be tuned by adjusting a number of parameters as discussed above. These parameters are summarized in the following table.

TABLE 1. System Characterization Parameters

Parameter: Comments

primaryDerate: Adjusts operational load level of storage systems relative to the maximum throughput. That is, adjusts service time (queue lengths) for storage system load at which buffer rings are sized.

available I/O rate: Specifies storage bandwidth reserved for metadata and available I/O.

secondaryDerate: Reduces streaming bandwidth to allow for additional deadline safety margin.

prefill margin: Specifies deadline safety margin. Note: secondaryDerate obtains underrun protection at the cost of potential streaming bandwidth; prefill margin obtains underrun protection at the cost of additional buffer memory.

ioOverlap: Specifies the target number of I/O requests kept queued with the operating system in the seek reorder buffer. ioOverlap trades off seek efficiency against service time variability. (Higher service time variability requires more memory for buffers.)

blockSize: Specifies block size. blockSize trades off seek amortization against buffer fragmentation at lower stream rates.

[0072] These parameters may be used to configure and adjust the performance of a media storage system such as the system described above. The maximum sustainable throughput of the storage system may be characterized as described above, such as by using a synthetic load. In order to adjust the operational load level of the storage system relative to the maximum throughput, the characterized maximum sustainable throughput may be derated by the primary derate parameter. The primary derate parameter is configurable and may be set during system configuration. Queues, such as the deadline queues described above, may be sized based on the maximum throughput as derated by the primary derate factor. The resultant throughput may be called the primary throughput. This primary throughput may be used for sizing the buffer rings as described above. The primary derate parameter provides a safety margin for the operational load level of the storage system at the expense of lowering the available maximum throughput. By setting the primary derate parameter during system configuration, the user may adjust this trade off as needed for any particular application of the storage system.

[0073] The available I/O rate parameter specifies the storage bandwidth reserved for non-rate guaranteed requests, as discussed above in regard to the bandwidth allocator. The amount of bandwidth reserved for non-guaranteed-rate requests versus guaranteed rate requests may be configured using this parameter. Depending upon a system's needs, the user may adjust the proportioning between non-guaranteed and guaranteed rate requests by adjusting this available rate parameter.

[0074] The secondary derate parameter reduces bandwidth available for rate guaranteed streams. The primary throughput is proportioned according to the available rate parameter, and the proportion allocated for rate guaranteed streams is further reduced by the secondary derate parameter to provide additional deadline safety margin. During operation additional streams may be admitted up to the point that the aggregate of all stream rates entirely consumes the portion of the primary throughput allocated to guaranteed rate streams as derated by the secondary derate parameter.

[0075] The prefill margin parameter specifies a deadline safety margin used during the calculation of buffer ring sizes. During system configuration buffer ring sizes may be calculated for various stream rates, such as described in regard to FIG. 13. The prefill margin parameter specifies a margin by which the deadlines must be met during this buffer ring size calculation process, i.e., the prefill margin provides a margin by which buffer underrun must be avoided when the buffer ring sizes are being determined. Note that the prefill margin parameter obtains additional underrun protection at the cost of additional memory used for larger ring buffers. A larger prefill margin will result in larger ring buffer sizes since, for certain stream rates, additional buffers will be required in the buffer ring to avoid missing the requests' deadlines by the specified prefill margin. In contrast, the secondary derate parameter obtains additional underrun protection at the cost of potential bandwidth for rate guaranteed streams. Thus, the secondary derate parameter and prefill margin parameter provide a user of the storage system with the capability to adjust the system performance by making several different tradeoffs as is optimum for a particular application. For example, if plenty of memory is available, but additional bandwidth is needed, then the secondary derate may be lowered and the prefill margin increased. However, if memory is at a premium, the prefill margin may be decreased and the secondary derate parameter increased.

[0076] The I/O overlap parameter (also referred to as the seek reorder buffer length parameter) specifies the number of storage requests queued with the operating system for a storage unit. For example, in the system described above, a seek reorder queue is used to queue requests to the storage units in an order according to the physical disk address of the storage requests. The length of such a queue may be configured by the I/O overlap parameter. This parameter trades off seek efficiency against service time variability. For example, the larger the seek reorder queue is made, the more requests may be presented to the storage unit in a linear order, thus increasing drive seek efficiency. However, since the requests are reordered from their deadline and priority orderings, a longer seek reorder queue length will increase the variability in meeting request deadlines. This parameter may be taken into account when sizing the buffer rings such that larger seek reorder queue sizes may result in larger buffer ring sizes to account for the variability in satisfying request deadlines. Therefore, the I/O overlap parameter may allow the user to trade off memory that must be made available for buffers versus higher drive seek efficiency.

[0077] In one embodiment the block size by which media data is accessed on the storage units may be configured according to a block size parameter. Configuring the block size may allow for trading off seek amortization against buffer fragmentation at lower stream rates. A larger block size may allow for greater seek efficiency; however, a larger block size may also result in more fragmentation and less efficient use of storage capacity for certain file sizes.

[0078] Moving now to FIG. 14, a flow chart illustrating one embodiment of a method of scheduling disk access requests in a traversal list is shown. In order to schedule a disk access request, a suitable disk head traversal must be found. A search of the traversal list is initiated in Step 2001. When searching a disk head traversal, a determination must be made as to whether the traversal being searched is active (Step 2002). If the traversal being searched is active, the current disk block address (i.e. the current address of the disk head) is compared to the address of the disk request. The address of the disk request must be beyond the current disk block address with respect to the specified direction of the disk head traversal. This requirement may help minimize disk head motion, as it may prevent the disk head from having to change direction during a given traversal. If the address of the disk request is not beyond the current disk block address, then the search process must begin again at Step 2001.

[0079] If the address of the disk request is beyond the current address of the disk head (Step 2004), or the currently searched traversal is not active (active=false, Step 2002), the search algorithm then looks at the number of disk requests in the disk request list (Step 2003). Prior to beginning the search algorithm, a maximum number N of disk requests per disk head traversal is specified. By limiting the number of disk requests per disk head traversal, the response time for a given disk access request may be effectively bounded. This may allow for relatively uniform disk access times, which may be required for certain applications (particularly multimedia applications). Typical values of N are between 8 and 10 requests per traversal, although the value of N may be changed to suit various embodiments. Large values of N typically result in greater optimization of disk head motion, although disk access times may be less uniform. Conversely, smaller values of N may allow for more uniform disk access times, with less optimization of disk head motion.

[0080] If the number of disk requests in the currently searched traversal has reached the specified maximum value N, a new traversal must be searched. A determination is made in Step 2005 as to whether all traversals have been searched. If all disk head traversals on the traversal list have been searched without finding a suitable location for the disk access request, two new disk head traversals are constructed and appended to the end of the traversal list (Step 2006). The disk request list of each of the newly constructed disk head traversals is empty, and thus a subsequent search may easily find a suitable location for a disk access request. Various embodiments of the system constrain the number of disk head traversals to be even, with the direction of disk head motion alternating with each subsequent traversal. Thus, the first newly constructed traversal to be appended to the traversal list may specify a direction of head motion opposite that of the previous disk head traversal. The second newly constructed traversal to be appended may specify a direction of disk head motion opposite that of the first newly constructed traversal. Following the appending of the two newly constructed traversals, searching resumes with Step 2001.

[0081] Once a suitable disk head traversal is found, the disk access request may then be entered into the traversal's disk request list (Step 2007). The disk request may be entered in a location on the list to allow disk head motion to continue in a single direction during the traversal. This may require re-arranging some of the disk accesses already entered into the disk request list. Following this, a variable indicating the number of disk requests in the disk request list is incremented (Step 2008). Another variable indicating the total number of disk requests for all disk head traversals is also incremented (Step 2009).
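The scheduling method of FIG. 14 may be sketched as follows. The names Traversal, TraversalList and schedule are hypothetical, and the sketch reads "begin again at Step 2001" as moving on to the next traversal in the list. Each disk request list is kept in ascending address order; a high-to-low traversal would simply be served from the end of its list.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Traversal:
    ascending: bool                                     # direction: low-to-high if True
    requests: list[int] = field(default_factory=list)   # addresses, kept ascending
    active: bool = False                                # True once servicing has begun


class TraversalList:
    def __init__(self, max_per_traversal: int):
        if max_per_traversal < 1:
            raise ValueError("N must be at least 1")
        self.n = max_per_traversal                      # bound N on requests per traversal
        self.traversals = [Traversal(True), Traversal(False)]
        self.current_address = 0                        # current disk block address
        self.total = 0                                  # requests across all traversals

    def _beyond_head(self, t: Traversal, address: int) -> bool:
        if t.ascending:
            return address > self.current_address
        return address < self.current_address

    def schedule(self, address: int) -> Traversal:
        while True:
            for t in self.traversals:                   # Step 2001: search the list
                if t.active and not self._beyond_head(t, address):
                    continue                            # Steps 2002/2004
                if len(t.requests) >= self.n:
                    continue                            # Step 2003: traversal is full
                bisect.insort(t.requests, address)      # Steps 2007/2008
                self.total += 1                         # Step 2009
                return t
            # Steps 2005/2006: append two empty traversals, alternating direction.
            last = self.traversals[-1].ascending
            self.traversals += [Traversal(not last), Traversal(last)]


if __name__ == "__main__":
    tl = TraversalList(max_per_traversal=2)
    for addr in (0x30, 0x10, 0x20, 0x40, 0x50):
        tl.schedule(addr)
    for t in tl.traversals:
        print("low-to-high" if t.ascending else "high-to-low", [hex(a) for a in t.requests])
```

Because a request can be overtaken only by the other requests sharing its traversal and by those in earlier traversals, each of which holds at most N entries, the wait for any single request is bounded.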

[0082] Other embodiments using different methods of reordering the disk requests are possible and contemplated. For example, one alternate embodiment may maintain a single list of all disk requests. Each entry on the list corresponding to a disk request may include a variable indicating the number of subsequently arriving disk requests that have been re-ordered to be satisfied before the original disk request. This variable may have a maximum value that, once reached, may cause any further disk requests to be scheduled after the original disk request. Thus, the number of disk requests that may be reordered to be satisfied before an original disk request is limited, which may effectively bound the amount of time required to satisfy the original disk request.
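The alternate single-list embodiment might look roughly like the sketch below. Everything beyond the per-entry counter is an assumption: the names Entry and insert_bounded are hypothetical, and the sketch arbitrarily orders the list by ascending address, letting a new request move ahead of queued requests with higher addresses until it meets one whose counter has reached the maximum.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    address: int
    bypassed: int = 0   # later arrivals that were reordered ahead of this entry


def insert_bounded(queue: list, address: int, max_bypass: int) -> int:
    """Insert a request into the service-ordered list and return its position.

    Walks back from the tail while the preceding entry has a higher address
    and can still tolerate being bypassed; every entry left behind the new
    request has its counter incremented.
    """
    pos = len(queue)
    while (pos > 0
           and queue[pos - 1].address > address
           and queue[pos - 1].bypassed < max_bypass):
        pos -= 1
    for entry in queue[pos:]:
        entry.bypassed += 1
    queue.insert(pos, Entry(address))
    return pos


if __name__ == "__main__":
    q = []
    for addr in (50, 40, 30, 20):
        insert_bounded(q, addr, max_bypass=2)
    print([(e.address, e.bypassed) for e in q])
```

In the usage example the request for address 50 is bypassed twice, after which the request for address 20 must queue behind it even though its address is lower.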

[0083] Turning now to FIG. 15, an example of a traversal list which may be used for scheduling disk access requests using the method of FIG. 14 is shown. As illustrated here, traversal list 2500 includes entries corresponding to eight different disk head traversals. Each entry of the list includes a traversal number, a disk request list, an active variable, a direction variable, and a variable indicating the number of disk requests for the corresponding disk head traversal. In the example shown, each disk request list may contain up to 10 disk requests (i.e. N=10). Each disk request entered into a disk request list includes a disk address. The disk address is the location on the disk where the disk head is to read the data in order to satisfy the request. In the example shown, the addresses of each entry are represented in a hexadecimal format. In general, addresses may be entered into the disk request list in any format suitable to the particular embodiment.

[0084] Each traversal entry also includes a direction variable. In the embodiment shown, the direction is either indicated as low-to-high (low addresses to high addresses) or high-to-low. As such, the disk requests entered into the disk request list are ordered in a manner consistent with the state of the direction variable for the given disk head traversal. In some cases, entry of a disk request into a disk request list may require reordering of the previously entered requests in order to maintain the direction specified by the state of the direction variable. For example, if a disk request for disk address 1A2F is to be entered in the third traversal in the list, it may be entered in the fourth position on the list. The disk request previously in the fourth position (disk address 1FA1) may be moved to the fifth position of the disk request list, thus maintaining the low-to-high direction specified for the traversal.

[0085] Each traversal also includes a Boolean variable to indicate whether the traversal is active. The traversal is considered active when disk requests are carried out from its associated disk request list. When active, the active variable is set to a true state. For the remaining traversals, the active variable remains in a false state until the system begins to satisfy disk requests from its associated disk request list.

[0086] The traversal list also includes a current disk block address, which indicates the current position of the disk head with respect to the disk. In the example shown, the current disk block address is 0110. This corresponds to the first disk access request of the first traversal on the list. The current disk block address may be used by the scheduling algorithm when attempting to schedule a disk request in an active traversal. For example, if attempting to schedule a disk access request in an active traversal with a specified direction of low-to-high, the disk request must be for a higher address than the current disk block address in this embodiment. If the address of the disk request does not meet this requirement, it may then be scheduled to a non-active traversal.

[0087] FIG. 16 is a flow chart illustrating a method of executing the disk access requests scheduled using the method in FIG. 14. Step 3000 begins with a check of the total number of all disk requests scheduled in the traversal list. If no disk requests are scheduled, the system enters a wait state (Step 3001), remaining idle until at least one disk request is scheduled. If the number of disk requests is greater than zero, then the system will check the disk request list of the first entry of the traversal list (Step 3003). If the disk request list is empty (which may indicate that all disk requests of the traversal have been performed), the entry corresponding to the traversal is removed from the traversal list, and the active variable is set to false (Step 3008). Next, the variable indicating the number of disk requests on the traversal's disk request list is set to zero (Step 3009). Finally, the traversal is appended to the end of the traversal list (Step 3010).

[0088] If the check performed in Step 3003 indicates that the disk request list is not empty, a check is made to see if the active variable is true (Step 3004). Typically, if no disk requests have been performed for the current traversal, the active variable will be false, and thus must be set to true (Step 3005). With the active variable true, the next disk request is removed from the disk request list in preparation for performing the disk access (Step 3006). Following the removal of the disk request from the disk request list, the variable indicating the total number of disk requests for all entries of the traversal list is decremented (Step 3007). Finally, in Step 3011, a disk access is performed, thereby satisfying the disk access request.
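One pass through the flow of FIG. 16 may be sketched as shown below. This is a minimal sketch under stated assumptions rather than the implementation of the embodiment: the class names `Traversal` and `Scheduler`, the function `execute_step`, the `perform_access` callback, and the returned status strings are all hypothetical. The step numbers in the comments refer to the steps described above.

```python
from dataclasses import dataclass, field


@dataclass
class Traversal:
    direction: str                                 # "low-to-high" or "high-to-low"
    requests: list = field(default_factory=list)   # ordered disk addresses
    active: bool = False
    count: int = 0                                 # requests scheduled to this traversal


@dataclass
class Scheduler:
    traversals: list
    total_requests: int = 0                        # requests across all traversals


def execute_step(sched, perform_access):
    """Run one pass of the execution flow and report what happened."""
    # Step 3000: check the total number of scheduled requests.
    if sched.total_requests == 0:
        return "wait"                              # Step 3001: idle
    head = sched.traversals[0]
    # Step 3003: check the first traversal's disk request list.
    if not head.requests:
        sched.traversals.pop(0)
        head.active = False                        # Step 3008
        head.count = 0                             # Step 3009
        sched.traversals.append(head)              # Step 3010
        return "rotated"
    if not head.active:                            # Step 3004
        head.active = True                         # Step 3005
    address = head.requests.pop(0)                 # Step 3006
    sched.total_requests -= 1                      # Step 3007
    perform_access(address)                        # Step 3011
    return "accessed"


# Usage: the first traversal is exhausted, the second holds one request.
performed = []
sched = Scheduler(
    traversals=[Traversal("low-to-high"),
                Traversal("high-to-low", requests=[0x0300], count=1)],
    total_requests=1,
)
results = [execute_step(sched, performed.append) for _ in range(3)]
```

In this sketch the per-traversal count is reset only at Step 3009, matching the description. Whether it is also adjusted elsewhere is not stated in the text, so it is left untouched when a request is removed.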

[0089] While the present invention has been described with reference to particular embodiments, it will be understood that the embodiments are illustrative and that the invention scope is not so limited. Any variations, modifications, additions, and improvements to the embodiments described are possible. These variations, modifications, additions, and improvements may fall within the scope of the inventions as detailed within the following claims.

What is claimed is:
 1. A disk storage system comprising: a disk for storing data; a disk head for reading said data from said disk; and a scheduler for receiving a plurality of disk access requests, wherein said scheduler is configured to schedule disk access requests to a first traversal of said disk head, and, in response to determining that a total number of N requests have been scheduled to said first traversal, to schedule remaining disk access requests to one or more additional traversals of said disk head.
 2. The disk storage system as recited in claim 1, wherein said disk storage system is configured to maintain a list of disk head traversals, said list including a plurality of entries.
 3. The disk storage system as recited in claim 2, wherein each of said plurality of entries includes a disk request list, wherein said disk request list includes disk access requests for an associated disk head traversal.
 4. The disk storage system as recited in claim 3, wherein each of said plurality of entries includes a variable for indicating a number of disk access requests in said disk request list.
 5. The disk storage system as recited in claim 2, wherein each of said plurality of entries includes a Boolean variable for indicating whether an associated disk head traversal is active.
 6. The disk storage system as recited in claim 5, wherein said Boolean variable is true when said associated disk head traversal is active.
 7. The disk storage system as recited in claim 2, wherein each of said plurality of entries includes a variable indicating a direction of motion for said disk head for an associated disk head traversal.
 8. The disk storage system as recited in claim 7, wherein the direction of motion of said first traversal of said disk head is in a direction opposite the direction of motion for a second traversal of said disk head, wherein said second traversal of said disk head immediately follows said first traversal of said disk head.
 9. The disk storage system as recited in claim 8, wherein said list of disk head traversals includes an even number of entries.
 10. The disk storage system as recited in claim 1, wherein said disk storage system is configured to maintain a variable indicating an address of said disk head.
 11. A method of scheduling disk access requests in a disk storage system, the disk storage system including a disk for storing data and a disk head for reading data, the method comprising: scheduling a plurality of disk access requests to a first traversal of said disk head; and scheduling remaining disk access requests to one or more additional traversals of said disk head in response to determining that a total number of N requests have been scheduled to said first traversal of said disk head.
 12. The method as recited in claim 11 further comprising maintaining a list of disk access requests scheduled for execution during a traversal of said disk head.
 13. The method as recited in claim 12, wherein said method includes determining if said first traversal is active.
 14. The method as recited in claim 13, wherein said method includes determining a current address of said disk head.
 15. The method as recited in claim 12, wherein said method includes maintaining a traversal list, said traversal list including a plurality of entries corresponding to said disk head traversals.
 16. The method as recited in claim 15, wherein each of said plurality of entries includes a Boolean variable for indicating if a corresponding disk head traversal is active.
 17. The method as recited in claim 16, wherein said Boolean variable is true when corresponding disk head traversal is active.
 18. The method as recited in claim 15, wherein each of said plurality of entries includes a variable for indicating a direction of disk head motion during a corresponding disk head traversal.
 19. The method as recited in claim 18, wherein the direction of disk head motion of a second traversal is opposite of the direction of disk head motion of said first disk head traversal, wherein said second disk head traversal immediately follows said first disk head traversal.
 20. The method as recited in claim 19, wherein said traversal list includes an even number of entries.
 21. The method as recited in claim 15 further comprising the scheduling of two new disk head traversals if a suitable disk head traversal is not found for a disk access request, wherein said two new disk head traversals are appended to the end of said traversal list.
 22. The method as recited in claim 14, wherein a variable is maintained indicating the total number of disk access requests for all of said plurality of entries of said traversal list.
 23. A disk storage system comprising: a disk for storing data; a disk head for reading said data from said disk; and a traversal list including a plurality of entries, wherein each of said entries includes a list of disk requests for a traversal of said disk head, and wherein said list of disk requests may include up to N disk access requests.
 24. The disk storage system as recited in claim 23, wherein each of said plurality of entries includes a variable to indicate the direction of motion of said disk head during a traversal of said disk head.
 25. The disk storage system as recited in claim 24, wherein said traversal list includes entries corresponding to a first traversal and a second traversal, wherein said first traversal has a direction of motion opposite of said second traversal, and wherein said second traversal immediately follows said first traversal.
 26. The disk storage system as recited in claim 25, wherein said traversal list includes an even number of entries.
 27. The disk storage system as recited in claim 23, wherein each of said plurality of entries includes a Boolean variable for indicating whether a traversal is active and a variable indicating a number of disk requests in said list of disk requests.
 28. A method of scheduling disk access requests in a disk storage system, said disk storage system including a disk head for reading data from a disk, the method comprising: maintaining a list of disk access requests scheduled for execution during a traversal of said disk head; determining from said list whether a number of disk access requests scheduled for a first traversal of said disk head has reached a maximum number (N); and scheduling said additional disk access requests to additional traversals in response to determining said number of disk requests scheduled for said first traversal has reached said maximum number (N).
 29. The method as recited in claim 28, wherein said method includes reading a Boolean variable to determine if said first traversal is active.
 30. The method as recited in claim 29, wherein said method includes determining the address of said disk head in response to determining that said first traversal is active.
 31. The method as recited in claim 28, wherein a list of traversals is maintained, said list of traversals including a plurality of entries corresponding to traversals of said disk head.
 32. The method as recited in claim 31, wherein each of said plurality of entries includes a variable for indicating the direction of motion of a corresponding disk head traversal.
 33. The method as recited in claim 32, wherein a direction of motion for said first traversal is opposite of the direction of motion for a second traversal, and wherein said second traversal occurs immediately after said first traversal.